**FULL TITLE** 

ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION** 
**NAMES OF EDITORS** 



The Sys-Rem Detrending Algorithm: Implementation and 
Testing 



T. Mazeh 

Wise Observatory, Sackler Faculty of Exact Sciences, Tel Aviv
University, Tel Aviv 69978, Israel

O. Tamuz

Wise Observatory, Sackler Faculty of Exact Sciences, Tel Aviv
University, Tel Aviv 69978, Israel

S. Zucker

Department of Geophysics and Planetary Sciences, Sackler Faculty of
Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel


Abstract. Sys-Rem (Tamuz, Mazeh & Zucker 2005) is a detrending algorithm
designed to remove systematic effects in a large set of lightcurves obtained
by a photometric survey. The algorithm works without any prior knowledge of
the effects, as long as they appear in many stars of the sample. This paper
presents the basic principles of Sys-Rem and discusses a parameterization used
to determine the number of effects removed. We assess the performance of
Sys-Rem on simulated transits injected into WHAT survey data. This test is
proposed as a general scheme to assess the effectiveness of detrending
algorithms. Application of Sys-Rem to the OGLE dataset demonstrates the power
of the algorithm. We offer a coded implementation of Sys-Rem to the community.


1. Introduction

Since the discovery that the planet orbiting HD 209458 (Mazeh et al. 2000)
transits the disk of its host star (Charbonneau et al. 2000; Henry et al.
2000), many photometric searches for transits have been put into operation
(e.g., Horne 2003). However, until September 2006 the yield of these searches
was surprisingly small. Only the realization that systematic effects and red
noise (Pont, Zucker & Queloz 2006) are an impediment to transit detection
explained why many searches detected fewer planets than expected. The work of
Pont, Zucker & Queloz (2006) sharpened the need to account for the presence
of red noise in the survey data. Sys-Rem (Tamuz, Mazeh & Zucker 2005), an
algorithm to remove systematic effects in large sets of lightcurves obtained
by photometric surveys, is designed exactly to answer this need. The algorithm
can detect any effect that appears linearly in many lightcurves obtained by
the survey. Recently, Sys-Rem and other detrending algorithms such as TFA
(Kovács, Bakos & Noyes 2005) have become standard tools in the processing of
transit survey lightcurves, already contributing to the recent detection of
several transits (Bakos et al. 2006; Collier Cameron et al. 2006). This paper
discusses the implementation of Sys-Rem and suggests a way to assess its
performance.

Section 2 reviews the principles of Sys-Rem and Section 3 presents our stop- 
ping criterion, a parametrization to determine the number of effects to remove. 
In Section 4 we propose a test to assess the effectiveness of detrending algorithms 
and apply it to Sys-Rem. Section 5 discusses the application of the algorithm 
to the OGLE survey. We conclude with some remarks. 



2. The principle of Sys-Rem 

We first started to develop our algorithm in an attempt to correct for
atmospheric extinction, with an approach similar to that of Kruszewski &
Semeniuk (2003). We derived the best-fitting airmasses of the different images
and the extinction coefficients of the different stars, without having any
prior information on the stellar colours. However, the final result is a
general algorithm to deal with any linear systematic effects. In some
restricted cases, when one can ignore the different uncertainties of the data
points, this algorithm reduces to the well-known Principal Component Analysis
(Murtagh & Heck 1987, Ch. 2). However, when the uncertainties of the
measurements vary substantially, as is the case in many photometric surveys,
PCA performs poorly relative to Sys-Rem.

The principles of Sys-Rem can be easily explained using the original prob- 
lem we tried to solve. Colour-dependent atmospheric extinction is an obvious 
observational effect that contaminates ground-based photometric measurements. 
This effect depends on stellar colours, which are not always known. To correct for 
the atmospheric extinction one can find the effective colour of each star, which 
characterizes its variation as a function of the airmass of the measurements. 

Specifically, consider a set of $N$ lightcurves, each of which is derived from
$M$ images. Define the residual of each observation, $r_{ij}$, to be the
average-subtracted stellar magnitude of the $i$-th star derived from the
$j$-th image, taken at the airmass $a_j$. We can then define the effective
extinction coefficient $c_i$ of star $i$ to be the slope of the best linear
fit to the residuals of this star as a function of the corresponding
airmasses, aiming to remove the product $c_i a_j$ from each $r_{ij}$. In fact,
we search for the best $c_i$ that minimizes the expression

$$S_i^2 = \sum_j \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2} ,$$

where $\sigma_{ij}$ is the uncertainty of $r_{ij}$. Note that the derivation
of each $c_i$ is independent of all the other $c_i$'s, but does depend on all
the $a_j$'s.

The problem can now be turned around. Since atmospheric extinction
might depend not only on the airmass but also on weather conditions, we can
ask ourselves what is the best estimate of the airmass of each image, given the
known effective colour of each star. Thus, we can look for the $a_j$ that
minimizes

$$S_j^2 = \sum_i \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2} ,$$

given the previously calculated set of $\{c_i\}$. We can now recalculate new
best-fitting coefficients, $c_i$, for every star, based on the new $\{a_j\}$,
and continue iteratively. We thus have an iterative process which in essence
searches for the two sets, $\{c_i\}$ and $\{a_j\}$, that best account for the
atmospheric extinction.
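
Each of the two minimizations above is a one-parameter weighted least-squares problem, so setting the derivative of the weighted sum of squared residuals to zero gives a closed-form update. The following is a short worked derivation in the notation of this section, offered as an illustration rather than as a quotation of the distributed code:

```latex
\frac{\partial}{\partial c_i}\sum_{j}\frac{(r_{ij}-c_i a_j)^2}{\sigma_{ij}^2}=0
\quad\Longrightarrow\quad
c_i=\frac{\sum_j r_{ij}\,a_j/\sigma_{ij}^2}{\sum_j a_j^2/\sigma_{ij}^2},
\qquad
\frac{\partial}{\partial a_j}\sum_{i}\frac{(r_{ij}-c_i a_j)^2}{\sigma_{ij}^2}=0
\quad\Longrightarrow\quad
a_j=\frac{\sum_i r_{ij}\,c_i/\sigma_{ij}^2}{\sum_i c_i^2/\sigma_{ij}^2}.
```

Because each update uses the individual uncertainties as weights, poorly measured points contribute little to the fitted effect.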

Many simulations have shown that this iterative process converged to the
same $\{a_j\}$ and $\{c_i\}$, no matter what initial values were used.
Therefore, we suggest that the proposed algorithm can find the most suitable
effective airmass of each image and the extinction coefficient of each star.

The algorithm, in fact, finds the best two sets of $\{c_i ; i = 1, N\}$ and
$\{a_j ; j = 1, M\}$ that minimize the global expression

$$S^2 = \sum_{ij} \frac{(r_{ij} - c_i a_j)^2}{\sigma_{ij}^2} .$$

Therefore, although the alternating 'criss-cross' iteration process (Gabriel &
Zamir 1979) started with the actual airmasses of the different images, the
values of the final set of parameters $\{a_j\}$ and $\{c_i\}$ are not
necessarily related to the true airmass and extinction coefficient. They are
merely the variables by which the global sum of residuals, $S^2$, varies
linearly most significantly. They could represent any strong systematic effect
that might be associated, for example, with time, temperature or position on
the CCD. This algorithm finds the systematic effect as long as the global
minimum of $S^2$ is achieved.
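
The alternating iteration described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the C code offered by the authors: the function names `fit_effect` and `global_s2`, the fixed iteration count `n_iter`, and the zero-denominator guards are all hypothetical choices made for the example.

```python
def fit_effect(residuals, sigmas, a_init, n_iter=50):
    """Alternate weighted least-squares updates of c_i (stars) and a_j (images).

    residuals[i][j] and sigmas[i][j] are the residual and uncertainty of
    star i in image j. The fixed number of iterations is an assumption;
    a real implementation would likely test for convergence instead.
    """
    n_stars = len(residuals)
    n_images = len(residuals[0])
    a = list(a_init)
    c = [0.0] * n_stars
    for _ in range(n_iter):
        # Best c_i for each star, holding the image terms a_j fixed.
        for i in range(n_stars):
            num = sum(residuals[i][j] * a[j] / sigmas[i][j] ** 2
                      for j in range(n_images))
            den = sum(a[j] ** 2 / sigmas[i][j] ** 2 for j in range(n_images))
            c[i] = num / den if den > 0 else 0.0
        # Best a_j for each image, holding the star terms c_i fixed.
        for j in range(n_images):
            num = sum(residuals[i][j] * c[i] / sigmas[i][j] ** 2
                      for i in range(n_stars))
            den = sum(c[i] ** 2 / sigmas[i][j] ** 2 for i in range(n_stars))
            a[j] = num / den if den > 0 else 0.0
    return c, a


def global_s2(residuals, sigmas, c, a):
    """Global weighted sum of squared residuals after removing c_i * a_j."""
    return sum(((residuals[i][j] - c[i] * a[j]) / sigmas[i][j]) ** 2
               for i in range(len(residuals))
               for j in range(len(residuals[0])))
```

For data that consist of a single linear effect with no noise, the fitted product of the two sets reproduces the residuals and the global sum drops to zero, even though the individual values are determined only up to a common scale factor.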

Now, suppose the data are affected by a few different systematic effects,
with different $\{c_i\}$ and $\{a_j\}$. Sys-Rem can be applied repeatedly,
until it finds no more significant linear effects in the residuals.



3. The halting problem 

Formally, the process of identifying additional 'systematic' effects in any set of 
lightcurves can be applied till there is no variation left in all lightcurves. To 
prevent such a situation, it is obvious that Sys-Rem needs a stopping criterion, 
which will enable it to remove the strong systematic effects in the data without 
removing the signal of the variable stars, the transit signals in particular. 

Our stopping criterion is based on a measure of the strength of each effect
in each lightcurve. We therefore define $\beta$ as the fractional r.m.s.
removed by subtracting the effect from a specific lightcurve. We assume that a
significant effect yields a large $\beta$. Note that $\beta$ is defined
independently for each lightcurve and each effect. Our stopping mechanism
involves choosing $\beta_{min}$, so that we apply Sys-Rem subtraction only to
effects and lightcurves with $\beta > \beta_{min}$.

In order to estimate $\beta_{min}$ we use the value of $\beta$ found in a set
of randomly generated lightcurves with a noise structure similar to that of
the real ones. For each lightcurve in a given dataset we generate a
corresponding random lightcurve with a randomization technique, by which we
keep the stellar intensities but randomly permute the timing of all
measurements. Such a procedure should get rid of any correlated noise hidden
in the original lightcurves, while keeping the level of the white noise. We
then apply Sys-Rem to the entire set of false lightcurves, find a Sys-Rem
effect in this randomized matrix, and calculate for each false lightcurve its
$\beta$ value. We use the distribution of the $\beta$ values of the false
lightcurves to derive $\beta_{min}$, which is the $\beta$ value for which a
fraction $\alpha$ of the random lightcurves have smaller values of $\beta$.
Therefore, $\beta_{min}$ is a monotonically increasing function of $\alpha$.
Choosing $\alpha$ of, say, 0.9 means removing only effects that are stronger
than 90 percent of all random effects. In Section 4 we try various values of
$\alpha$ and find that 0.9 is indeed a good choice for the WHAT dataset.
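
The pieces of the stopping criterion can be sketched as follows. This is an illustrative sketch only: it assumes that the fractional r.m.s. removed is computed as one minus the ratio of the r.m.s. after subtraction to the r.m.s. before, and that the threshold is read off the sorted random values at the fraction chosen. The function names are hypothetical, and fitting the effect to the permuted lightcurves is left to whatever Sys-Rem implementation is at hand.

```python
import math
import random


def rms(values):
    """Root-mean-square of a sequence of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def fractional_rms_removed(residuals, c_i, a):
    """Strength of one effect in one lightcurve: 1 - rms(after) / rms(before)."""
    before = rms(residuals)
    if before == 0.0:
        return 0.0
    after = rms([r - c_i * a_j for r, a_j in zip(residuals, a)])
    return 1.0 - after / before


def permute_lightcurve(values, rng):
    """Keep the measured values but shuffle which epoch each one belongs to."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def threshold_from_random(random_strengths, fraction):
    """Value below which the given fraction of the random strengths fall."""
    ordered = sorted(random_strengths)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]
```

In use, one would permute every lightcurve with `permute_lightcurve`, fit an effect to the permuted set, collect `fractional_rms_removed` for each permuted lightcurve, and pass the resulting list to `threshold_from_random` with the chosen fraction (0.9 in the case preferred here).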



4. Assessing the performance of a detrending algorithm 

We propose here a general scheme to test the performance of detrending algo- 
rithms. The test is performed on simulated data which have been generated 
by injecting simulated transit signals into real data. The test is conducted by 
applying the detrending algorithm to the simulated data and then searching for 
transits. A transit search applied after an effective detrending algorithm should 
detect a large fraction of the injected transits, and should not yield many 'false 
positive' transits that had not been injected into the data. 

The test proposed here deserves two comments at this stage. The first is 
related to the assumption that the real data do not include many real transits. 
Even an accurate set of lightcurves can include at most only very few transits, 
so the evaluation of the test should not be thwarted by the presence of real 
transits. The second comment has to do with the fact that the test we propose 
really checks the effectiveness of the detrending algorithm together with the 
transit detection technique, applied to a specific dataset. Therefore, this test 
does not assess the overall performance of a detrending algorithm, but only its 
usefulness when applied to a specific dataset and used in conjunction with a 
particular transit detection algorithm. 

The detection of a transit candidate in a set of lightcurves is never an 
absolute result, and each transit candidate is likely to be a false positive with 
some probability. Therefore, any transit survey inevitably outputs a list of 
transit candidates, prioritized according to some statistic. Thus, we expect an 
effective detrending algorithm, Sys-Rem or other, to increase the priority of the 
real transits in the candidate list, reducing the number of false positives and 
making follow-up more efficient. 

To test how well Sys-Rem performs we injected simulated transit signals
into some of the lightcurves of WHAT field no. 236 (Shporer et al. 2006). We
ran the BLS transit detection algorithm (Kovács, Zucker & Mazeh 2002) on all
lightcurves, including the ones with no transit signal, and constructed a
candidate list. We repeated this procedure after applying Sys-Rem, with three
different values of $\alpha$.
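
As an illustration of the injection step, the sketch below adds a box-shaped dimming to a lightcurve given in magnitudes. The box shape, the parameter names and the function `inject_box_transit` are assumptions made for this example; the text does not specify the transit model that was actually injected.

```python
def inject_box_transit(times, mags, period, epoch, duration, depth):
    """Return a copy of mags with a box-shaped transit added.

    times, period, epoch and duration share one time unit; depth is in
    magnitudes. A transit makes the star fainter, so the magnitude increases.
    """
    injected = []
    for t, m in zip(times, mags):
        phase = ((t - epoch) / period) % 1.0
        # Phase distance from mid-transit, folded into the range [0, 0.5].
        distance = min(phase, 1.0 - phase)
        in_transit = distance * period <= duration / 2.0
        injected.append(m + depth if in_transit else m)
    return injected
```

Injecting into real lightcurves in this way keeps the survey's own noise and systematics in the test data, which is the point of the scheme proposed in this section.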

Fig. 1 shows the fraction of injected transits that were detected by BLS and
appear at the top of the candidate list. The detection fraction of the injected
transits depends on how far down the list of candidates one goes. Therefore,
the figure presents the detection fraction as a function of the number of false
detections included in the top of the list. In other words, the figure shows
what fraction of the simulated injected transits one could detect if one is
ready to include a given number of false positive cases. The higher the graph
is, the higher is the probability to include the transit in the top of the
list.

Figure 1. Fraction of transits detected vs. number of false detections, for
different values of $\alpha$.
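
A curve of this kind can be computed from a ranked candidate list by walking down the list and counting injected and non-injected entries. The sketch below assumes the list is already sorted by the detection statistic, best candidate first, and that each entry is flagged as injected or not; the function name `detection_curve` is hypothetical.

```python
def detection_curve(ranked_is_injected, n_injected):
    """Detection fraction versus number of false detections.

    ranked_is_injected: booleans, best candidate first, True when the
    candidate corresponds to an injected transit.
    Returns one (n_false, fraction_detected) pair per candidate examined.
    """
    curve = []
    n_false = 0
    n_found = 0
    for is_injected in ranked_is_injected:
        if is_injected:
            n_found += 1
        else:
            n_false += 1
        curve.append((n_false, n_found / n_injected))
    return curve
```

For example, `detection_curve([True, False, True, True, False], 4)` returns `[(0, 0.25), (1, 0.25), (1, 0.5), (1, 0.75), (2, 0.75)]`: three of the four injected transits are recovered at the cost of one false detection.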



The figure shows that applying Sys-Rem with any of the three values of $\alpha$
dramatically improved the detectability of the injected transits. It seems
that for this specific data set and this specific set of injected transits,
Sys-Rem with $\alpha = 0.5$ is inferior to the other two options. Because of
the limited observational resources, most of the present follow-up projects
cannot afford a high number of false positive cases, and therefore it seems
that Sys-Rem with $\alpha = 0.9$ is the best of the three options tested here.

Note that the number of false positive transits included in the top of the 
list depends on the data, its red noise distribution in particular. Therefore the 
meaning of the figure is limited to the WHAT set of lightcurves. 



5. Application to OGLE 

We applied Sys-Rem to the photometric data collected by the OGLE survey in
three Carina fields: CAR100, CAR104 and CAR105 (Udalski et al. 2002). This
dataset includes 1200 measurements of about one million stars. In each field,
we applied Sys-Rem separately to each of the 8 CCD chips.

To demonstrate the effectiveness of Sys-Rem's application to the OGLE data
we present in Fig. 2 and Fig. 3 some results from the data of chip 8 in field
CAR105. We selected 200 bright stars from the chip, calculated AoV
periodograms (Schwarzenberg-Czerny 1989) for each, and then averaged the
periodograms to produce the upper panel of Fig. 2. The averaged periodogram
clearly shows a periodicity of 1 day and its harmonics, and some low-frequency
power. Obviously, this has to do with some systematic effects.

Figure 2. Averaged AoV periodograms of 200 OGLE stars before (upper panel)
and after (bottom panel) applying Sys-Rem.



We generated a similar averaged periodogram after we applied Sys-Rem, to
produce the bottom panel. As is evident, Sys-Rem removes not only periodic
variability at frequencies with an integer number of cycles per day, but also
most of the low-frequency variability. Note, however, that some troughs appear
at the one-day harmonics, which indicates that Sys-Rem may have removed some
true signal along with the systematics, impairing detection of transits at
these frequencies. Kruszewski & Semeniuk (2003) got similar results when they
searched for systematics with periodicity of one day and its harmonics.

Fig. 3 shows, for the same chip, the fractional change in the RMS scatter
obtained by Sys-Rem (in percentage of the initial scatter), as a function of the
magnitude and the original scatter. The top panel of the figure shows that
Sys-Rem is most effective in reducing the scatter of the brighter stars, where
the systematic noise is more dominant. For those stars the improvement can
reach 30% of the original scatter. Note that a small but substantial
improvement can be seen for all stars. The increase of Sys-Rem improvement for
the faint stars is probably due to removal of systematics associated with
background subtraction.

Figure 3. The reduction in the RMS scatter of the OGLE lightcurves as a
function of the magnitude (upper panel) and the initial scatter (bottom panel).



6. Conclusion 

We have presented a stopping criterion of Sys-Rem, based on the fraction of 
the variability subtracted from any lightcurve by removing the systematic effect. 
This power is compared with the power subtracted from random lightcurves with 
the same noise level. The comparison is parameterized, so a different threshold 
of removing fractional variability can be adopted. We assessed the performance 
of Sys-Rem on simulated transits injected into the WHAT survey dataset and 
found that Sys-Rem improved the detectability substantially. This is true for all 
three values of the stopping parameter used. We propose this test as a general 
scheme to assess the effectiveness of detrending algorithms. 

We have presented an application of Sys-Rem to the dataset of the OGLE
transit search. We demonstrated that the algorithm can eliminate a significant
part of the systematic noise hidden in the lightcurves. The stars most
affected are the brighter ones, where the photon noise is less significant and
the systematics grow in importance.

In a previous conference we (Mazeh et al. 2006) offered the community an
'overnight cleaning service', through which we apply our Sys-Rem code to their
photometric data and return the data clean of systematics, ready for a search
for minute periodic variability. It turned out that most researchers do not
like their laundry to be cleaned by others. Therefore, we are now offering the
community the opportunity to acquire their own cleaning facility: the code is
available in C and can be obtained upon request from omert@wise.tau.ac.il.